Auto-resolve Apple Mail V10 accounts via Accounts4.sqlite #172

Open
schlabrendorff wants to merge 2 commits into wesm:main from schlabrendorff:feature/apple-mail-auto-accounts

schlabrendorff commented Mar 4, 2026

Builds upon PR #166

Motivation

The previous import-emlx command required users to manually specify both an email identifier and a mail directory:

msgvault import-emlx me@gmail.com ~/Library/Mail/V10/AAAAAAAA-BBBB-CCCC-DDDD-EEEEEEEEEEEE

This was especially painful when importing V10 folders (#157), for several reasons:

  1. Opaque GUIDs: Apple Mail V10 stores each account under a UUID directory (e.g. AAAAAAAA-BBBB-CCCC-DDDD-EEEEEEEEEEEE). There's no obvious way to know which GUID belongs to which email account without digging into Apple's internal databases.
  2. One account at a time: Users with multiple accounts (Gmail, Yahoo, iCloud, Exchange) had to run the command once per account, manually mapping each GUID.
  3. Error-prone: Getting the GUID-to-email mapping wrong means messages get filed under the wrong source, which is hard to fix after the fact.

Apple already maintains this mapping in ~/Library/Accounts/Accounts4.sqlite. By reading it, we can auto-discover all accounts and import them in one shot.

Summary

  • Add internal/applemail package that reads ~/Library/Accounts/Accounts4.sqlite to map V10 directory GUIDs to email addresses, resolving child→parent account relationships (IMAP, Exchange, "On My Mac")
  • Rework import-emlx command from import-emlx <email> <mail-dir> to import-emlx [mail-dir] with auto-discovery — users no longer need to manually figure out which GUID maps to which email
  • Add --accounts-db, --account (repeatable filter), and --identifier (manual fallback) flags
  • Export emlx.IsUUID() for reuse across packages

New UX

Before — one account at a time, manual GUID mapping:

msgvault import-emlx me@gmail.com ~/Library/Mail/V10/AAAAAAAA-BBBB-CCCC-DDDD-EEEEEEEEEEEE
msgvault import-emlx me@yahoo.com ~/Library/Mail/V10/11111111-2222-3333-4444-555555555555
msgvault import-emlx me@icloud.com ~/Library/Mail/V10/FFFFFFFF-0000-1111-2222-333333333333
# ... repeat for each account, hoping you got the GUIDs right

After — all accounts in one command:

# Import everything (defaults to ~/Library/Mail)
msgvault import-emlx

# Or with explicit path
msgvault import-emlx ~/Library/Mail

# Filter to just one account
msgvault import-emlx --account me@gmail.com

# Manual fallback still works for non-V10 layouts
msgvault import-emlx ~/Downloads/old-mail/INBOX.mbox --identifier me@gmail.com

Example output from auto-discovery:

Discovered 4 account(s):
  - alice@gmail.com (Google)
  - On My Mac
  - alice@yahoo.com (Yahoo!)
  - alice@icloud.com (iCloud)

Importing alice@gmail.com (Google)...
Import complete.
  Mailboxes:      3 discovered, 3 imported
  Processed:      809 messages
  Added:          809 messages
  ...

Importing alice@yahoo.com (Yahoo!)...
Import complete.
  Mailboxes:      13 discovered, 13 imported
  Processed:      10908 messages
  Added:          10907 messages
  ...

=== Grand Total ===
  Mailboxes:      52 discovered, 52 imported
  Processed:      39038 messages
  Added:          34590 messages
  Updated:        0 messages
  Skipped (dup):  4448 messages
  Errors:         0

How it works

  1. Scans mail-dir for V*/ directories containing UUID subdirectories
  2. Opens Accounts4.sqlite read-only and resolves each GUID using COALESCE(child.ZUSERNAME, parent.ZUSERNAME) to handle IMAP child accounts that inherit email from their parent
  3. Imports each account sequentially, printing per-account summaries and a grand total
  4. Accounts without an email (e.g. "On My Mac") use their description as the identifier
  5. GUIDs not found in the accounts DB are skipped with a warning

schlabrendorff and others added 2 commits March 2, 2026 19:32
…ectories

Apple Mail V10 stores large mailboxes by splitting .emlx files across
numeric partition subdirectories (0-9) nested under Data/ at arbitrary
depths (e.g., Data/0/3/Messages/123.emlx). Previously, only the
top-level Data/Messages/ directory was scanned, missing the majority of
messages in partitioned mailboxes.

Changes:
- Add FileIndex map to Mailbox struct for resolving partition file paths
- Add FilePath() method for transparent path resolution
- Add recursive partition discovery (hasEmlxFilesInPartitions,
  collectPartitionFiles) with isDigitDir/isEmlxFile helpers
- Handle partition-only layouts where Data/Messages/ doesn't exist
- Use FilePath() in emlx importer instead of direct path join

Before: 12 mailboxes, 31k files (top-level Messages/ only)
After:  52 mailboxes, 39k files (all partition depths)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add auto-discovery of Apple Mail accounts by reading the GUID-to-email
mapping from ~/Library/Accounts/Accounts4.sqlite. The import-emlx command
now accepts optional [mail-dir] (defaults to ~/Library/Mail) and discovers
accounts automatically, eliminating the need to manually specify email
identifiers for V10 directory GUIDs.

New flags: --accounts-db, --account (filter), --identifier (manual fallback).
New package: internal/applemail with ResolveAccounts and DiscoverV10Accounts.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

roborev-ci bot commented Mar 4, 2026

roborev: Combined Review (baf2124)

Verdict: The PR successfully implements Apple Mail V10 account auto-discovery, but introduces medium-severity issues regarding directory selection logic and CLI backward compatibility.

Medium

Stale Data Import Risk
File: internal/applemail/accounts.go:181
V10AccountDir returns the first V* match for a GUID. If the same GUID exists in multiple version directories (e.g., V2, V10, V11), the import source selection is lexicographic rather than selecting the latest/active version, which may result in importing stale data.
Suggested fix: Parse the numeric version from the V<digits> directory names and select the highest version containing the GUID, or explicitly constrain the selection to the intended version.

CLI Backward Compatibility Regression

File: import_emlx.go:54
The CLI argument contract changed from requiring <identifier> <mail-dir> to accepting [mail-dir] with an --identifier flag, now enforcing MaximumNArgs(1). Existing scripts utilizing the old two-argument format will hard-fail, causing a functional regression for upgrading users.
Suggested fix: Support both forms for a transition period (e.g., using RangeArgs(0, 2)), map legacy arguments to identifier and mailDir, and emit a deprecation warning.


Synthesized from 3 reviews (agents: codex, gemini | types: default, security)
